Skip to content

refactor: add try_to_proto to HashTableLookupExpr#22451

Open
AnuragRaut08 wants to merge 5 commits into
apache:mainfrom
AnuragRaut08:feat/port-hash-table-lookup-expr-proto
Open

refactor: add try_to_proto to HashTableLookupExpr#22451
AnuragRaut08 wants to merge 5 commits into
apache:mainfrom
AnuragRaut08:feat/port-hash-table-lookup-expr-proto

Conversation

@AnuragRaut08
Copy link
Copy Markdown

Which issue does this PR close?

Part of #22435

What changes are included?

Adds try_to_proto to HashTableLookupExpr so it participates in the
expression-local serialization pattern introduced in #21929.

HashTableLookupExpr holds a runtime Arc<Map> that cannot be
serialized, so try_to_proto replaces it with lit(true). This is
safe because the filter is a performance optimisation only — lit(true)
passes all rows and the join produces correct results either way.

The centralized arm in to_proto.rs remains as a fallback for now.
Cleanup can follow in a separate PR once this lands.

Are these changes tested?

Yes — covered by the existing roundtrip_hash_table_lookup_expr_to_lit
test in datafusion/proto/tests/cases/roundtrip_physical_plan.rs.

Are there any user-facing changes?

No.

Part of apache#22435. Adds try_to_proto to HashTableLookupExpr so it
participates in the expression-local serialization pattern introduced
in apache#21929.

HashTableLookupExpr holds a runtime Arc<Map> that cannot be serialized,
so try_to_proto replaces it with lit(true). This is safe because the
filter is a performance optimisation only — lit(true) passes all rows
and the join produces correct results either way.

The centralized arm in to_proto.rs remains as a fallback for now.
@github-actions github-actions Bot added the physical-plan Changes to the physical-plan crate label May 22, 2026
@AnuragRaut08
Copy link
Copy Markdown
Author

Thanks! CI is currently running. Please let me know if any adjustments are needed — happy to iterate quickly.

/// - `lit(true)` passes all rows so no valid rows are lost.
/// - In distributed execution the remote worker has no access to the
/// build-side hash table anyway.
pub fn try_to_proto(
Copy link
Copy Markdown
Contributor

@jayshrivastava jayshrivastava May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this must be moved to the impl PhysicalExpr for HashTableLookupExpr block to actually be used.

The way to test this code is being used is to delete the old paths in datafusion/datafusion/proto/src/physical_plan/to_proto.rs and datafusion/datafusion/proto/src/physical_plan/from_proto.rs.

Since this is a small PR, you may want to considering adding try_from_proto as well :)

/// - The filter is a performance optimisation, not a correctness requirement.
/// - `lit(true)` passes all rows so no valid rows are lost.
/// - In distributed execution the remote worker has no access to the
/// build-side hash table anyway.
Copy link
Copy Markdown
Contributor

@jayshrivastava jayshrivastava May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment comes from roundtrip_hash_table_lookup_expr_to_lit() in /datafusion/datafusion/proto/tests/cases/roundtrip_physical_plan.rs but I don't think it's very true.

IIUC, hash joins might build a HashTableLookupExpr during execution after the build side is done. These expressions get placed in the dynamic filter expr.

If you serialize before executing the plan, then there's no code path where there would be a HashTableLookupExpr in the plan today. If you deserialize and execute that plan, then the HashJoinExec may create a fresh HashTableLookupExpr for the dynamic filter. In this case, all the row pruning is preserved.

If you serialize after executing, then any potential HashTableLookupExpr would be replaced with lit(true). I don't think this has any impact. One must call reset_state to re-execute plans, in which case I would expect the HashTableLookupExpr to disappear. In this case, all the row pruning is preserved as well.

Maybe we can change the comment to explain these two cases? ^

Since HashTableLookupExpr is public it might be good to warn users that it does not get serialized. We could (a) file a ticket to track that HashTableLookupExpr are not serialized and (b) add a comment directly on HashTableLookupExpr.

Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AnuragRaut08
Thanks for working on this.
I agree with @jayshrivastava's point about moving this into the trait impl, and I also noticed a separate build issue around the proto wiring that looks blocking before merge.

/// - `lit(true)` passes all rows so no valid rows are lost.
/// - In distributed execution the remote worker has no access to the
/// build-side hash table anyway.
pub fn try_to_proto(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building on Jay's comment, I think this needs to live in impl PhysicalExpr for HashTableLookupExpr rather than as an inherent HashTableLookupExpr::try_to_proto method.

The serializer calls expr.try_to_proto(&ctx) through &dyn PhysicalExpr, so this implementation is not reached. Serialization still depends on the centralized downcast fallback in datafusion/proto/src/physical_plan/to_proto.rs.

Could you move this into the PhysicalExpr impl with the same #[cfg(feature = "proto")] signature? That should make the expression-local serialization path actually take effect.

/// - `lit(true)` passes all rows so no valid rows are lost.
/// - In distributed execution the remote worker has no access to the
/// build-side hash table anyway.
pub fn try_to_proto(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also seeing a separate build issue here. This new signature references physical_expr::proto_encode, datafusion_proto_common, and datafusion_proto_models unconditionally, but datafusion-physical-plan does not expose those dependencies in a normal build.

cargo check -p datafusion-physical-plan currently fails with unresolved imports for proto_encode, datafusion_proto_common, and datafusion_proto_models.

Could you gate this consistently with the existing PhysicalExpr::try_to_proto API and wire the dependencies through the appropriate proto feature?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants